Abstract:
Method for detecting events in a data stream, comprising at least the following steps: • defining a voting domain decomposed into I intervals [Di, Di+1[ or subdomains; • initializing an array S containing score values S[e][t], each corresponding to an event "e" located at t; • detecting one or more micro-events mn in the data stream, located at t1, ..., tN (20) and corresponding to an observation On; • for each micro-event mn, for each possible event "e", for each interval i, [Di, Di+1[, and for all the locations d contained in that interval, updating the table S[e][t] of the variables "e" by adding the same confidence value for a given micro-event mn and interval i, for each location t1, ..., tN; • determining, from the table S[e][t], a set of events probably associated with the local observation (32).
Publication number: FR3026526A1
Application number: FR1459143
Filing date: 2014-09-26
Publication date: 2016-04-01
Inventors: Adrien Chan-Hon-Tong; Laurent Lucat
Applicants: Commissariat à l'Energie Atomique CEA; Commissariat à l'Energie Atomique et aux Energies Alternatives CEA
Main IPC class:
Description:

[0001] The invention relates to a method and a system for detecting events. It is used to detect events of a known nature in real-time or recorded data streams. It applies, for example, to detecting activities of a listed nature performed by a person, everyday actions of a person of interest, emotions (of a listed nature) of a person of interest, the presence of an object of known nature in a scene, or a sound event of known nature, by analyzing a signal, typically a video. It also applies to seismographic or geological events of known nature. The detection and recognition of events such as human activity has become an important focus of research: the ability to detect human activities can lead to applications in different areas, for example video protection or the medical field. The systems known from the prior art detect micro-events that are characteristic of one or more different events; the weighted accumulation of these micro-events in time or space then highlights one event, then another, and so on. The detection of a micro-event generates weighted votes for each event at each temporal or spatial location. In existing systems, the vote of a micro-event can follow an arbitrary distribution, with a different weight at each point of the time or space domain, which requires a significant amount of memory and of operations. Figure 1 is a block diagram of a detection method according to the prior art. For each observation Oi, a vote or confidence value Vi,0 (11) is associated at a given instant t0 (10); this value is used to determine the associated event (13) at t0. The generic detection approach known from the prior art therefore requires a rather large memory.
Assuming that each value of the score function v is stored as a "double", that is to say on 8 bytes, the generic method of the prior art needs 8 * E * M * (2D + 1) bytes to store v, where E is the number of events of interest, M the number of detectable micro-events, 2D + 1 the size of the time domain of the votes, and D the half-size of the time domain over which the observation is examined. Moreover, propagating the information requires, for each micro-event, E * (2D + 1) read and addition operations on "doubles": reading the table v[e][mn][d], where d is a relative observation instant, and adding the value read to the event table S[e][tn + d]. For example, if E = 30 (there are 30 possible different events), M = 1000 (different possible micro-events) and D = 100 (equivalent of 5 seconds at 25 frames per second), the memory required by the generic detector is 60 MB and the number of read operations is 15,000. Methods that use more micro-events need a memory able to store the equivalent of 1 GB of votes, which is generally not compatible with a standard microcontroller. One known approach to reduce the memory and the computations required is to compress the array v[e][mn][d], for example with the method described in "Best piecewise constant approximation" by Hiroshi Konno et al., Operations Research Letters, volume 7, number 4, August 1988. This approach leads to a blind modification of the votes. Indeed, the votes are intermediaries between the micro-events and the segmentation; changing the votes without regard to their effect on the segmentation amounts to making blind changes, and can lead to a significant increase in error even for small changes.
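The storage and read counts above follow directly from the sizes of the tables. The Python sketch below evaluates them; the function names and the 8-byte "double" assumption are ours, and the interval-based count E * M * I corresponds to the structure this document introduces later. Evaluated literally with E = 30, M = 1000 and D = 100, the storage formula gives 6,030,000 values, i.e. 48,240,000 bytes; with D = 125, which is what 5 seconds at 25 frames per second corresponds to, it gives 60,240,000 bytes.

```python
BYTES_PER_DOUBLE = 8  # assumption: each vote is stored as a 64-bit "double"


def generic_cost(E: int, M: int, D: int) -> tuple[int, int]:
    """Prior-art layout v[e][m][d]: one vote per event, micro-event and offset.

    Returns (bytes needed to store v, vote reads per detected micro-event).
    """
    window = 2 * D + 1                      # offsets d in [-D, D]
    stored_values = E * M * window
    reads_per_micro_event = E * window      # one read of v[e][mn][d] per (e, d)
    return stored_values * BYTES_PER_DOUBLE, reads_per_micro_event


def interval_cost(E: int, M: int, I: int) -> tuple[int, int]:
    """Piecewise-constant layout v[e][m][i]: one vote per interval i."""
    stored_values = E * M * I
    reads_per_micro_event = E * I           # one read per (e, i), reused over the interval
    return stored_values * BYTES_PER_DOUBLE, reads_per_micro_event


if __name__ == "__main__":
    print(generic_cost(30, 1000, 100))      # (48240000, 6030)
    print(interval_cost(30, 1000, 10))      # hypothetical I = 10: (2400000, 300)
```

Note that the interval layout only reduces reads; the number of additions into S is still E * (2D + 1) per micro-event unless the "step" variant is used.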
The document "A Hough Transform-Based Voting Framework for Action Recognition" by Angela Yao et al., IEEE Conference on Computer Vision and Pattern Recognition, June 13-18, 2010, pages 2061-2068, describes a method for classifying actions or observations contained in a video sequence. The 2014 document by Adrien Chan-Hon-Tong, "Simultaneous segmentation and classification for human actions in a video stream", describes an action detection method that takes into account the temporal location of actions. Documents US 2012/0128045 and US 5,430,810 describe methods for detecting lines in images and relate to the optimized management of intermediate results. The methods described in the prior art known to the applicant require a memory large enough to process the data. One objective of the present invention is to provide a method and a system that reduce the memory needed to store all the votes, and reduce the number of processing operations, without impairing the performance of the detector (that is to say, while remaining compatible with learning methods deemed effective in the literature). In the remainder of the description, the word "vote" denotes an elementary belief that a micro-event is associated with a localized event, and the "score" of an event denotes a confidence or probability value that the event corresponds to the observation made on a signal. The score of an event e is the sum of all the votes of the micro-events in favor of this event e.
[0002] The idea of the present invention is in particular to improve event detection systems by considering time intervals over each of which a constant vote (confidence) value is assigned. The invention relates to a method for detecting events in a data stream, the method comprising at least the following steps: - defining a voting domain, in which one or more possible events associated with an observation are searched for in the data stream, decomposed into I intervals [Di, Di+1[ or subdomains; - initializing an array S containing score values S[e][t], each corresponding to an event "e" located at t; - detecting one or more micro-events mn in the data stream, located at t1, ..., tN (20) and corresponding to an observation On; - for each micro-event mn, for each possible event "e", for each interval i, [Di, Di+1[, and for all the locations d contained in the interval [Di, Di+1[, updating the table S[e][t] of the variables "e", each corresponding to an event, using the same confidence value v for a given micro-event mn and interval i, for example by adding this value, for each location t1, ..., tN; - determining, from the table S[e][t], a set of probable events associated with a local observation in the data stream (32). The locations t1, ..., tN are, for example, time instants.
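Because the vote is constant on each interval, it is looked up by interval rather than by offset. A minimal Python sketch, assuming the I intervals are given by a sorted list of I + 1 boundaries, interval i (0-based) being [bounds[i], bounds[i+1]); the names `interval_index` and `piecewise_vote` are hypothetical:

```python
from bisect import bisect_right
from typing import Sequence


def interval_index(d: int, bounds: Sequence[int]) -> int:
    """Return i such that bounds[i] <= d < bounds[i + 1].

    `bounds` holds the I + 1 sorted boundaries of the voting domain,
    e.g. [-D, ..., D + 1] so that every offset in [-D, D] is covered.
    """
    if not bounds[0] <= d < bounds[-1]:
        raise ValueError(f"offset {d} lies outside the voting domain")
    return bisect_right(bounds, d) - 1


def piecewise_vote(v_em: Sequence[float], d: int, bounds: Sequence[int]) -> float:
    """Vote of one (event, micro-event) pair at offset d.

    v_em stores only I values, one per interval, instead of 2D + 1 values.
    """
    return v_em[interval_index(d, bounds)]


if __name__ == "__main__":
    bounds = [-3, -1, 1, 4]           # D = 3, I = 3 intervals
    v_em = [0.2, 1.0, 0.5]            # hypothetical learned votes
    print([piecewise_vote(v_em, d, bounds) for d in range(-3, 4)])
    # [0.2, 0.2, 1.0, 1.0, 0.5, 0.5, 0.5]
```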
[0003] According to an alternative embodiment, the micro-event detection instants are chosen as multiples of a given "increment" value. The method may process a data stream of given size T.
[0004] According to another variant embodiment, the data stream is processed continuously. The intervals can be defined from the instants D' that achieve the maxima computed by the following steps: PrecalculGain(D, I): if I = 0, return 0; if D = 0, return 1; otherwise, return the maximum over D' of (2D' + 1)(D - D' + 1)^2 + PrecalculGain(D' - 1, I - 1). GAIN(D, I) is the D' such that (2D' + 1)(D - D' + 1)^2 + PrecalculGain(D' - 1, I - 1) = PrecalculGain(D, I); GAIN(D' - 1, I - 1) is then displayed. The invention also relates to a system for detecting events in a data stream, comprising at least the following elements: - a sensor adapted to detect one or more micro-events mn in the data stream, at given locations t1, ..., tN, corresponding to an observation Oi; - a memory for storing the results; - a processor adapted to perform the steps of the method according to the invention. The system may include a buffer for storing the stream to be analyzed when the data stream is continuous. Other characteristics and advantages of the present invention will become clearer on reading the following description of exemplary embodiments, given by way of illustration and in no way limiting, with reference to the appended figures, which show: - Figure 1, a diagram of the method according to the prior art; - Figure 2, the principle implemented by the method according to the invention; - Figure 3, an example of detector architecture according to the invention; - Figure 4, an illustration of a first variant of the method; and - Figure 5, an example of construction of time intervals.
[0005] The following detailed example is given as an illustration for the detection of an action of a person from data contained in a video stream, either of a given size or arriving continuously at a detection or imaging sensor. Here, the action of a person stands for any type of event of known nature. The following description may be applied, without departing from the scope of the invention and possibly with some adaptations, to any type of event listed beforehand, for example during a prior learning phase. Each video is seen as a succession of images/instants: each instant corresponds to an image and vice versa. The method according to the invention makes use of statistical learning. During this learning, each image of each video is annotated with the event associated with it. This annotation is given, for example, by an expert who may use a series of images around the image in question, or even all the information contained in the video. Learning consists in parameterizing the detector so that its outputs match the annotated data as often as possible. For each image of the video, the method and the system implementing it seek the events that best match the observation.
[0006] FIG. 2 represents an exemplary architecture of a system for implementing the steps of the method according to the invention. The system comprises a processor 20 and a memory 21 adapted to store both the software and the vote values obtained by executing the method, which are read to determine the probable event. The system also comprises a sensor 22, connected to the processor, for detecting the signal or video, and an output 24 connected, for example, to a display system 25 showing the event selected and associated with an observation. The system may also include a buffer 26 for storing the stream to be analyzed when a data stream is analyzed continuously.
[0007] According to a first embodiment illustrated in FIG. 3, the method and its system update the score of each type of event. To do so, the method extracts a set of micro-events located in the video. Each micro-event increases the score of each event by a certain amount (that is, it "votes" for each event) in a temporal or spatial neighborhood of the image from which it was extracted. Once all nearby micro-events have increased the event scores for a given instant, a set of relevant events is selected to characterize the present and past observation. For example, the events whose score exceeds a threshold value are retained, or the Q events with the Q highest scores (in particular the single best one if Q = 1). The data and parameters used to implement the method illustrated in FIG. 3 are as follows: - m1, ..., mN: the N micro-events detected at instants t1, ..., tN; - T: the size of the video; - E: the number of events whose potential presence is sought; - D: the time domain of the votes; - -D = D1 < D2 < ... < DI = D: a decomposition of the time domain into I intervals; - d: an instant in the interval [Di, Di+1[; - "Detection": the set of predictions returned (one per image).
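The two selection rules mentioned above (threshold, or the Q best scores) can be sketched in Python as follows; `select_events` is a hypothetical helper, and the tie-breaking rule is our assumption, since the text does not specify one.

```python
from typing import Optional, Sequence


def select_events(scores: Sequence[float],
                  threshold: Optional[float] = None,
                  q: int = 1) -> list[int]:
    """Pick the relevant events at one instant from the column S[.][t].

    With a threshold, keep every event whose score exceeds it; otherwise
    keep the q events with the highest scores (q = 1 gives the arg max).
    Ties are broken in favor of the lower event index (sorted() is stable).
    """
    ranked = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
    if threshold is not None:
        return [e for e in ranked if scores[e] > threshold]
    return ranked[:q]


if __name__ == "__main__":
    column = [0.2, 0.9, 0.5]                     # hypothetical S[e][t] for E = 3
    print(select_events(column, threshold=0.4))  # [1, 2]
    print(select_events(column, q=1))            # [1]
```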
The method may comprise the following steps:
- Initialize an array S of size T * E: S[e][t] = 0, where the table S contains the values of the variables e, each corresponding to an event, and t is a given observation instant.
- For n = 1 to N:
    For e = 1 to E (loop over the possible events):
      For i = 1 to I (loop over the time intervals):
        For d from Di (included) to Di+1 (excluded) (loop over the observation instants d contained in the time interval):
          update the table S[e][t] by adding a confidence or score value v, constant over the time interval [Di, Di+1[, for the micro-event mn and the time interval i (31):
          S[e][tn + d] = S[e][tn + d] + v[e][mn][i], where "=" means "is replaced by the value".
- For t = 1 to T, determine the characteristic event, for example the most probable one, by searching for the maximum value over e (32):
    Detection[t] = arg max over e of S[e][t],
  which gives the most likely characteristic event associated with an observation Oi of the video.
All the vote values associated with a pair {micro-event; event} are imposed to be identical for the instants contained in the time interval [Di, Di+1[, and are stored as a single value. The number of values to store therefore decreases from E * M * (2D + 1) with the methods known from the prior art to E * M * I. Likewise, the number of read operations in memory is reduced. According to another embodiment, illustrated in FIG. 4, the method assumes that the instants at which the observations Oi are made are multiples of a given "increment" value, better known in the technical field by the English term "step". The parameters and input data are as follows:
- m1, ..., mN: the N micro-events detected at instants t1, ..., tN, each instant tn being assumed to be a multiple of "step";
- T: the size of the video;
- E: the number of types of events of interest to the user;
- D: the time domain of the votes;
- -D = D1 < D2 < ... < DI = D: a decomposition of the time domain such that, for all i, Di and D are multiples of "step";
- "Detection": the set of predictions returned (one per image).
The method "INTERVALLES_STEP" then comprises the following steps:
- Initialize an array S of size T/step * E: S[e][t] = 0.
- For n = 1 to N (step 40):
    For e = 1 to E:
      For i = 1 to I:
        For j from Di (included) to Di+1 (excluded), in jumps of "step" (steps 41, 42):
          tau = (tn + j) / step (an instant that is a multiple of "step");
          S[e][tau] = S[e][tau] + v[e][mn][i] (step 43).
- For t = 1 to T: tau = integer part of (t / step); Prediction[t] = arg max over e of S[e][tau]; the event is thus detected for an observation (step 44).
This variant divides the number of additions of "doubles" by "step": when "step" = 4, this amounts to a 75% reduction in the number of additions of "doubles". The method may also be implemented to detect events or actions in a continuously observed video data stream. The algorithm is then an infinite loop, since the video arrives indefinitely, and it uses an "Extract" command to obtain the next extracted micro-event (word), the extraction running as an independent process.
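The batch procedure of the first embodiment and its "INTERVALLES_STEP" variant can be sketched together in Python, since step = 1 gives back the first embodiment. This is a minimal sketch under stated assumptions: indices are 0-based (the text counts from 1), the I intervals are given as I + 1 boundaries with [bounds[i], bounds[i+1]) being interval i, votes falling outside [0, T) are discarded, and `detect_batch` is a hypothetical name.

```python
from typing import Sequence


def detect_batch(micro_events: Sequence[tuple[int, int]],
                 v: Sequence[Sequence[Sequence[float]]],
                 bounds: Sequence[int], T: int, E: int,
                 step: int = 1) -> list[int]:
    """Interval-based vote accumulation over a recorded stream of size T.

    micro_events: (t_n, m_n) pairs; v[e][m][i]: one vote per interval i;
    bounds: I + 1 boundaries, interval i being [bounds[i], bounds[i + 1]).
    With step > 1, t_n and the boundaries must be multiples of step and
    S keeps only one slot per block of `step` instants.
    """
    if any(b % step for b in bounds):
        raise ValueError("interval boundaries must be multiples of step")
    n_slots = (T + step - 1) // step
    S = [[0.0] * n_slots for _ in range(E)]      # S[e][tau]
    for t_n, m in micro_events:
        if t_n % step:
            raise ValueError("detection instants must be multiples of step")
        for e in range(E):
            for i in range(len(bounds) - 1):
                vote = v[e][m][i]                # single stored value per interval
                for j in range(bounds[i], bounds[i + 1], step):
                    t = t_n + j
                    if 0 <= t < T:               # assumption: ignore votes outside the video
                        S[e][t // step] += vote
    # Detection[t] = arg max over e of S[e][t // step]
    return [max(range(E), key=lambda e: S[e][t // step]) for t in range(T)]


if __name__ == "__main__":
    v = [[[5.0, 0.0]], [[0.0, 3.0]]]             # hypothetical votes, E = 2, M = 1, I = 2
    print(detect_batch([(2, 0)], v, [-1, 0, 2], T=5, E=2))           # [0, 0, 1, 1, 0]
    print(detect_batch([(2, 0)], v, [-2, 0, 2], T=6, E=2, step=2))   # [0, 0, 1, 1, 0, 0]
```

With step = 4, the innermost loop runs over a quarter of the offsets of each interval, which is where the 75% reduction in additions comes from; the price is that S only resolves instants to the nearest block of "step" images.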
[0008] The parameters and input data of this process are as follows:
- E: the number of events elected or chosen;
- D: the time domain of the votes;
- -D = D1 < D2 < ... < DI = D: a decomposition of the time domain;
- "Detection": the set of predictions returned (one per image);
- B: a buffer size (typically 5 * D).
The method "INTERVALLE_FLUX" comprises, for example, the following steps:
- Initialize an array S of size B * E: S[e][t] = 0; t = -D.
- Loop:
    Extract the next micro-event m, located at t'.
    As long as t < t' - D:
      Detection[t] = arg max over e of S[e][t % B], where t % B denotes the remainder of the Euclidean division of t by B;
      S[e][t % B] = 0 (for every e);
      t = t + 1.
    For e = 1 to E:
      For i = 1 to I:
        For d from Di (included) to Di+1 (excluded):
          S[e][(t' + d) % B] = S[e][(t' + d) % B] + v[e][m][i].
According to another embodiment, the method assumes that the time intervals are those that maximize the expectation of a certain gain, as explained below. The parameters and input data of this variant are:
- m1, ..., mN: the N micro-events detected at instants t1, ..., tN;
- T: the size of the video;
- E: the number of types of events of interest;
- D: the time domain of the votes;
- -D = D1 < D2 < ... < DI = D: a decomposition of the time domain, the Di being exactly the bounds that maximize the expectation of this gain;
- "Detection": the set of predictions returned (one per image).
The method "INTERVALLE_GAIN" then comprises the following steps:
- Initialize an array S of size T * E: S[e][t] = 0.
- For n = 1 to N:
    For e = 1 to E:
      For i = 1 to I:
        For d from Di (included) to Di+1 (excluded):
          S[e][tn + d] = S[e][tn + d] + v[e][mn][i].
- For t = 1 to T: Detection[t] = arg max over e of S[e][t].
To obtain the intervals that maximize the expectation of this gain (they depend only on D and on the desired number of intervals I), one can proceed as follows:
PrecalculGain(D, I):
  if I = 0, return 0;
  if D = 0, return 1;
  otherwise, return the maximum over D' of (2D' + 1)(D - D' + 1)^2 + PrecalculGain(D' - 1, I - 1).
GAIN(D, I) is the D' such that (2D' + 1)(D - D' + 1)^2 + PrecalculGain(D' - 1, I - 1) = PrecalculGain(D, I); GAIN(D' - 1, I - 1) is then displayed.
This decomposition is of interest for a voting method with piecewise-constant votes, because the result of the function "PrecalculGain" can be interpreted as the (expected) amount of individual votes available during learning.
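The continuous variant "INTERVALLE_FLUX" relies on a circular buffer of B instants. A minimal Python sketch, assuming micro-events arrive in time order as (t', m) pairs from any iterable (standing in for the "Extract" command), and assuming the I intervals are given as I + 1 boundaries with [bounds[i], bounds[i+1]) being interval i and bounds[0] = -D; `detect_stream` is a hypothetical name. It yields (t, detected event) as soon as instant t can no longer receive votes. Skipping negative instants, and leaving the last instants unflushed when a finite stream ends, are our choices.

```python
from typing import Iterable, Iterator, Sequence


def detect_stream(stream: Iterable[tuple[int, int]],
                  v: Sequence[Sequence[Sequence[float]]],
                  bounds: Sequence[int], E: int,
                  B: int) -> Iterator[tuple[int, int]]:
    """Circular-buffer detection over a (possibly infinite) micro-event stream.

    B must cover the span of instants that can still receive votes,
    i.e. bounds[-1] - bounds[0]; the text suggests B of about 5 * D.
    """
    D = -bounds[0]
    if B < bounds[-1] - bounds[0]:
        raise ValueError("buffer too small for the voting domain")
    S = [[0.0] * B for _ in range(E)]   # S[e][t % B]
    t = -D                               # oldest instant not yet decided
    for t_new, m in stream:             # "Extract" the next micro-event
        # No micro-event at t_new or later can vote before t_new - D.
        while t < t_new - D:
            slot = t % B                 # Python's % is non-negative for B > 0
            if t >= 0:
                yield t, max(range(E), key=lambda e: S[e][slot])
            for e in range(E):
                S[e][slot] = 0.0         # free the slot for instant t + B
            t += 1
        for e in range(E):
            for i in range(len(bounds) - 1):
                vote = v[e][m][i]
                for d in range(bounds[i], bounds[i + 1]):
                    S[e][(t_new + d) % B] += vote


if __name__ == "__main__":
    v = [[[5.0, 0.0]], [[0.0, 3.0]]]    # hypothetical votes, E = 2, M = 1, I = 2
    print(list(detect_stream([(2, 0), (10, 0)], v, [-1, 0, 2], E=2, B=5)))
    # [(0, 0), (1, 0), (2, 1), (3, 1), (4, 0), (5, 0), (6, 0), (7, 0), (8, 0)]
```

Because it is a generator, the same function can consume an endless stream; memory stays at B * E values regardless of the length of the video.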
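The "PrecalculGain" recursion can be memoized so that GAIN is read off by backtracking. A minimal Python sketch under explicit assumptions: the garbled factor "(D - D' + 1) 2" of the source is read as a square, D' ranges over 1, ..., D so that PrecalculGain(D' - 1, I - 1) stays defined, and ties keep the smallest D'. The sketch returns the successive D' values produced by GAIN; how these values map onto the boundaries D1, ..., DI is not spelled out in the text, so it stops there. The names `precalc_gain` and `gain_sequence` are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def precalc_gain(D: int, I: int) -> tuple[int, int]:
    """Return (PrecalculGain(D, I), maximizing D'); D' is 0 at base cases."""
    if I == 0:
        return 0, 0
    if D == 0:
        return 1, 0
    best_value, best_dp = -1, 0
    for dp in range(1, D + 1):          # assumption: D' ranges over 1..D
        value = (2 * dp + 1) * (D - dp + 1) ** 2 + precalc_gain(dp - 1, I - 1)[0]
        if value > best_value:          # strict: ties keep the smallest D'
            best_value, best_dp = value, dp
    return best_value, best_dp


def gain_sequence(D: int, I: int) -> list[int]:
    """Successive D' values: GAIN(D, I), then GAIN(D' - 1, I - 1), ..."""
    values = []
    while I > 0 and D > 0:
        dp = precalc_gain(D, I)[1]
        values.append(dp)
        D, I = dp - 1, I - 1
    return values


if __name__ == "__main__":
    print(precalc_gain(100, 4)[0], gain_sequence(100, 4))
```

Memoization keeps the cost at about D * I table entries, each computed with a loop over at most D candidates, so the intervals can be precomputed once for a given D and I.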
[0009] Without departing from the scope of the invention, the main steps of the method apply, adapted if necessary, in the spatial domain: micro-events are then located in space, a spatial voting domain is defined, and the table S is initialized over space. Spatial rather than temporal locations are then considered when executing the steps. One of the advantages of the method according to the invention is that, whatever the temporal or spatial domain chosen (for example [-D, D]) and whatever the chosen decomposition into intervals (D1, ..., DI), it is always possible to apply the learning methods known from the literature. During the learning phase, the method can consider all the time intervals containing 0 that are built on the decomposition, that is to say the set of [Di, Dj[ with Di < 0 < Dj. An additional recombination step then yields the desired vote value on each interval [Di, Di+1[: it associates with an interval [Di, Di+1[ the sum of the variables associated with the intervals [Dj, Dk[ that contain it. The different variants of the method described above can be combined with each other in order to improve event detection. In the case where an event is an action and an observation is the skeleton of a person, a possible system for implementing the method comprises, for example, a computer to which an active camera is connected. For example, the processor 20 of the computer performs three tasks in parallel:
- extracting the skeleton of the observed person frame by frame, and forming the trajectory of each joint;
- extracting from these trajectories the positions, the speeds and the sequences of positions, at every image whenever possible, or every "step" images. Each piece of this information is written as a vector, according to a method known to those skilled in the art. Each of these vectors is then associated with its nearest neighbor among a bank of vectors specific to the joint and to the type of information (position, speed or sequence). As a result, each piece of information is associated with a symbol corresponding to a micro-event;
- using the extracted micro-events and the learned votes to determine the action performed by the person in the image (the event), according to one of the variants of the method described above.
During training, a corpus of annotated videos is used; the annotations define which actions will be searched for.
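The quantization step that turns each feature vector into a micro-event symbol can be sketched as a nearest-neighbor lookup. Assumptions: Euclidean distance (the text only says "nearest neighbor"), a bank of vectors keyed by (joint, information type), and a symbol formed as (joint, information type, index in the bank); `to_micro_event` and `BANKS` are hypothetical names.

```python
import math
from typing import Dict, Sequence, Tuple

Key = Tuple[str, str]                  # (joint, information type)


def to_micro_event(joint: str, kind: str, vector: Sequence[float],
                   banks: Dict[Key, Sequence[Sequence[float]]]) -> Tuple[str, str, int]:
    """Map a feature vector to the symbol of its nearest bank vector."""
    bank = banks[(joint, kind)]        # bank specific to the joint and info type
    nearest = min(range(len(bank)), key=lambda k: math.dist(vector, bank[k]))
    return joint, kind, nearest


if __name__ == "__main__":
    BANKS = {("wrist", "speed"): [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]}
    print(to_micro_event("wrist", "speed", [0.9, 1.2], BANKS))  # ('wrist', 'speed', 1)
```

Each distinct symbol can then serve as the micro-event index mn used to read the votes v[e][mn][i].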
[0010] Typically, the actions sought can be actions of everyday life: drinking, eating, reading, resting, cooking, taking medicine, etc. A voting domain is chosen based on the maximum number of consecutive images associated with the same action in a video. Then, a number of time intervals is chosen to decompose the voting domain, and the intervals Di associated with the "GAIN" function presented above are determined. The votes are then learned using the Deeply Optimized Hough Transform (DOHT) or any other statistical learning method. Using these votes with the method according to the invention makes it possible to keep a small error while using few resources: little memory, few reads of this memory, and possibly few additions in the case of the variant of the method with "downsampling" (the "step" variant). FIG. 5 illustrates this embodiment: for an event 50, the desired intervals 51, 52 are chosen, and the DOHT 53 is provided with approximate intervals, namely the set of intervals built on the union of the bounds and containing 0. The invention applies in particular to seismographic or sound signals, 2D images or 2D video, 3D images or 3D video. The signals may be dense signals in an N-dimensional matrix, or local signals, especially in the case of point data associated with a model (such as a body skeleton or a face mesh). The method and the system according to the invention notably offer the advantage of reducing the memory and the computation time required, while remaining compatible with an optimization framework deemed efficient. The invention proposes a structure for the vote values that induces a particular voting process, which reduces the number of values to store to E * M * I, with I much smaller than (2D + 1), and the number of read operations per word (micro-event) to only E * I. Compared with the example given for the prior art, the required memory is reduced to 3 MB instead of 60 MB, and the number of operations to 10,000 instead of 15,000.
Moreover, this structure is compatible with efficient vote selection. It therefore leads to informed vote changes, whose relevance is superior to that of the blind modifications of the prior art.
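The recombination used during learning (paragraph [0009], and the approximate intervals given to the DOHT in FIG. 5) can be sketched in Python as follows. Assumptions: the decomposition is given as a sorted list of I + 1 boundaries, elementary interval i being [bounds[i], bounds[i + 1]); a candidate interval is stored as a pair of boundary indices (a, b); "containing 0" is tested as bounds[a] <= 0 < bounds[b], the natural test for a half-open interval; the learned weights are a plain dictionary. All names are hypothetical, and the learning itself (DOHT or otherwise) is not shown.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]                 # (a, b) stands for [bounds[a], bounds[b])


def candidate_intervals(bounds: Sequence[int]) -> List[Pair]:
    """All intervals built on the decomposition that contain offset 0."""
    n = len(bounds)
    return [(a, b) for a in range(n) for b in range(a + 1, n)
            if bounds[a] <= 0 < bounds[b]]


def recombine(bounds: Sequence[int], weights: Dict[Pair, float]) -> List[float]:
    """Vote of each elementary interval i = sum of the learned weights
    of the candidate intervals [bounds[a], bounds[b]) that contain it."""
    votes = [0.0] * (len(bounds) - 1)
    for (a, b), w in weights.items():
        for i in range(a, b):          # elementary intervals inside [a, b)
            votes[i] += w
    return votes


if __name__ == "__main__":
    bounds = [-2, 0, 2]                                   # two elementary intervals
    print(candidate_intervals(bounds))                    # [(0, 2), (1, 2)]
    print(recombine(bounds, {(0, 2): 1.0, (1, 2): 0.5}))  # [1.0, 1.5]
```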
Claims:
1 - Method for detecting events in a data stream, comprising at least the following steps: - defining a voting domain, in which one or more possible events associated with an observation are searched for in the data stream, decomposed into I intervals [Di, Di+1[ or subdomains; - initializing an array S containing score values S[e][t], each corresponding to an event "e" located at t; - detecting one or more micro-events mn in the data stream, located in (20) and corresponding to an observation On; - for each micro-event mn, for each possible event "e", for each interval i, [Di, Di+1[, and for all the locations d contained in the interval [Di, Di+1[, updating the table S[e][t] of the variables "e", each corresponding to an event, using the same confidence value for a given micro-event mn and interval i, and for each localized data; - determining, from the table S[e][t], a set of probable events associated with a local observation in the data stream (32).
2 - Method according to claim 1, characterized in that the locations are time instants.
3 - Method according to claim 2, characterized in that the micro-event detection instants are chosen as multiples of a given "increment" value.
4 - Method according to claim 2, characterized in that the processed data stream is of given size T.
5 - Method according to claim 2, characterized in that the data stream is processed continuously.
6 - Method according to claim 2, characterized in that the most probable event is determined by searching for the argument of the maximum over e of the table S[e][t].
7 - Method according to one of claims 1 to 6, characterized in that the data stream is in the form of seismographic or sound signals, 2D images or 2D videos, 3D images or 3D video.
8 - System for detecting events in a data stream, comprising at least the following elements: - a sensor adapted to detect one or more micro-events mn in the data stream, at given locations t1, ..., tN, corresponding to an observation Oi; - a processor (20) adapted to perform the steps of the method according to one of claims 1 to 7; - a memory (21) for storing the results obtained by performing the steps.
9 - Detection system according to claim 8, characterized in that it comprises a buffer memory (26) for storing the continuously arriving data stream.
10 - Detection system according to one of claims 8 and 9, characterized in that it comprises an active camera connected to a microcomputer.
Patent family:
Publication number | Publication date
FR3026526B1|2017-12-08|
US20170293798A1|2017-10-12|
US10296781B2|2019-05-21|
WO2016046336A1|2016-03-31|
EP3198523A1|2017-08-02|
Citations:
Publication number | Filing date | Publication date | Applicant | Title
US20130259390A1|2008-02-15|2013-10-03|Heather Dunlop|Systems and Methods for Semantically Classifying and Normalizing Shots in Video|
US20110235859A1|2010-03-26|2011-09-29|Kabushiki Kaisha Toshiba|Signal processor|
JPH04287290A|1990-11-20|1992-10-12|Imra America Inc|Hough transformed picture processor|
JP4556195B2|2008-02-15|2010-10-06|カシオ計算機株式会社|Imaging device, moving image playback device, and program|
FR2982684B1|2011-11-10|2014-01-10|Commissariat Energie Atomique|SYSTEM AND METHOD FOR DIGITAL CIRCUIT DESIGN WITH ACTIVITY SENSOR|
US9349069B2|2011-11-21|2016-05-24|Analog Devices, Inc.|Dynamic line-detection system for processors having limited internal memory|
US10581945B2|2017-08-28|2020-03-03|Banjo, Inc.|Detecting an event from signal data|
US10313413B2|2017-08-28|2019-06-04|Banjo, Inc.|Detecting events from ingested communication signals|
US11025693B2|2017-08-28|2021-06-01|Banjo, Inc.|Event detection from signal data removing private information|
US10585724B2|2018-04-13|2020-03-10|Banjo, Inc.|Notifying entities of relevant events|
Legal status:
2015-09-30| PLFP| Fee payment|Year of fee payment: 2 |
2016-04-01| PLSC| Publication of the preliminary search report|Effective date: 20160401 |
2016-09-28| PLFP| Fee payment|Year of fee payment: 3 |
2017-09-29| PLFP| Fee payment|Year of fee payment: 4 |
2018-09-28| PLFP| Fee payment|Year of fee payment: 5 |
2019-09-30| PLFP| Fee payment|Year of fee payment: 6 |
2020-09-30| PLFP| Fee payment|Year of fee payment: 7 |
2021-09-30| PLFP| Fee payment|Year of fee payment: 8 |
Priority:
Application number | Publication number | Priority date | Filing date | Title
FR1459143A|FR3026526B1|2014-09-26|2014-09-26|METHOD AND SYSTEM FOR DETECTING EVENTS OF KNOWN NATURE|
US15/514,384| US10296781B2|2014-09-26|2015-09-24|Method and system for detecting events of a known nature|
EP15767185.0A| EP3198523A1|2014-09-26|2015-09-24|Method and system for detecting known natural events|
PCT/EP2015/072023| WO2016046336A1|2014-09-26|2015-09-24|Method and system for detecting known natural events|